Abstract: The rapidly increasing complexity of (mainly wireless) ad-hoc networks stresses the
need for reliable distributed estimation of several variables of interest. The widely used centralized
approach, in which the network nodes communicate their data to a single specialized point,
suffers from high communication overheads and introduces a single point of failure that needs
special treatment. This paper aims to contribute to another, quite recent method called
diffusion estimation. In this decentralized setting, the network nodes communicate only within
a close neighbourhood. We adopt the Bayesian framework for modelling and estimation, which,
unlike the traditional approaches, abstracts from a particular model case. This leads to a very
scalable and universal method, applicable to a wide class of different models. A particularly
interesting case, the Gaussian regressive model, is derived as an example.

Keywords: Regressive models; Distributed models; Model; Parameter estimation; Regression. 
1. INTRODUCTION

We deal with the problem of collaborative estimation of an unknown environmental
parameter from noisy measurements. It naturally arises, e.g., in modern complex wireless
systems and distributed sensor networks [Aysal and Barner, 2008]. There are two principal
design schemes for this estimation task: (i) the centralized approach, where the data are
transmitted to a designated processing center (sometimes called a fusion center) responsible
for estimation (e.g., Aysal and Barner [2008] and many others); and (ii) the decentralized
concept, where the nodes themselves are responsible for estimation (e.g., Xiao et al. [2006],
Cattivelli and Sayed [2010]). The decentralized methods are very promising, since the
increasing complexity of modern networks calls for approaches with low overheads with
respect to time, energy and communication resources. Besides that, potential single points
of failure (SPOFs) are avoided in principle, and a good design of the algorithm allows fast
spatial reconfiguration of the network.

There exist several Bayesian methods treating general tasks of a distributed character from
the decision making perspective, ranging from [Tsitsiklis and Athans, 1982] to [Aysal and
Barner, 2008]. We focus on the recently formulated diffusion estimation problem, i.e., fully
decentralized collaborative estimation in networks whose nodes may communicate only with
their adjacent neighbours. In this field, several non-Bayesian estimation algorithms have been
proposed. However, these are mostly oriented towards a single problem, e.g., least-squares
estimation [Xiao et al., 2006], recursive least squares (RLS, Cattivelli et al. [2008]), least
mean squares (LMS, Lopes and Sayed [2008], Cattivelli and Sayed [2010]), Kalman filters
(Cattivelli et al. [2008]) etc. We propose a new method called dynamic Bayesian diffusion
estimation, which tackles the problem from the consistent and versatile Bayesian viewpoint
and yields a methodology applicable to a much wider class of models, including, of course,
the traditional ones mentioned above. A particularly interesting application of the method to
a Gaussian linear regressive model results in the so-called diffusion recursive least-squares
method proposed in Cattivelli et al. [2008]. This demonstrates the generality of the method
and supports its feasibility. Furthermore, it shows that it is possible to shift from the
viewpoint of a Bayesian statistician to the traditionalist's one, disregarding the probabilistic
treatment of the parameters of interest.

In this paper, we implicitly assume that the communication among nodes does not violate
bandwidth or other restrictions. Restricted networks would require a specific solution,
which is beyond the scope of this paper.

The paper is organized as follows: In Section 2, we briefly introduce the basic principle of
Bayesian estimation. In Section 3, the dynamic Bayesian diffusion estimation theory is
developed. Its application to the Gaussian linear regressive model follows in Section 4, and
Section 5 shows the transition to the non-Bayesian recursive least squares. Since we show
that the method leads to an existing solution, a demonstration example is omitted. We
conclude our work and outline future research topics in Section 6.

2. BAYESIAN ESTIMATION 

Let us consider a linear stochastic system with a real input variable $u_t$ and a real output
variable $y_t$, observed at discrete time instants $t = 1, 2, \ldots$ Both $u_t$ and $y_t$ can be
scalar or multivariate. We form the data $d(t)$ as an ordered set of observations and inputs,
$d(t) = \{y_0, u_0, \ldots, y_t, u_t\}$. The dependence of the output $y_t$ on the previous data
$d(t-1)$ and the current input $u_t$ can be modelled by a conditional probability density
function (pdf)

$$f(y_t \mid u_t, d(t-1), \Theta), \qquad (1)$$

where $\Theta$ is a random, potentially multivariate model parameter.

The Bayesian methodology treats the model parameter as an unobservable random variable
whose knowledge at time $t$ is carried by past data $d(t-1)$. The Bayesian estimation of
$\Theta$ then exploits the pdf $g(\Theta \mid d(t-1))$. By the assumption of natural conditions
of control [Peterka, 1981] we have

$$g(\Theta \mid u_t, d(t-1)) = g(\Theta \mid d(t-1)), \qquad (2)$$

i.e., the information about the parameter $\Theta$ at time $t$ is conditionally independent of
the current input $u_t$. The prior knowledge $d(0) = \{y_0, u_0\}$ formed by the initial data
can be determined by an expert or it follows from past estimation. It is also possible to
start from a noninformative (flat) prior pdf.

The Bayesian recursive estimation exploits the Bayes' rule to incorporate new data into the
prior pdf of $\Theta$ as follows:

$$g(\Theta \mid d(t)) \propto f(y_t \mid u_t, d(t-1), \Theta)\, g(\Theta \mid d(t-1)), \qquad (3)$$

where $\propto$ denotes equality up to a normalizing constant. At the next time instant, the
posterior pdf on the left-hand side of (3) is used as the prior pdf. The last relation is also
known as the dynamic Bayesian data update.
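The recursion (3) can be illustrated on a discretized parameter. The following is a minimal sketch only: the Bernoulli output model, the grid of candidate values and the function name `bayes_update` are assumptions made for this example and do not come from the paper.

```python
def bayes_update(prior, likelihood):
    """One step of (3): the posterior is proportional to likelihood times prior."""
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical setting: Theta is the success probability of a binary output,
# restricted to a small grid; the prior is noninformative (flat).
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
posterior = [1.0 / len(grid)] * len(grid)

for y in [1, 1, 0, 1]:  # observed outputs y_1, ..., y_4
    likelihood = [th if y == 1 else 1.0 - th for th in grid]
    # The posterior of this step serves as the prior of the next one.
    posterior = bayes_update(posterior, likelihood)

map_estimate = grid[posterior.index(max(posterior))]
```

The loop mirrors the statement that the left-hand side of (3) is reused as the prior at the next time instant.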

3. DYNAMIC BAYESIAN DIFFUSION ESTIMATION 

Let us now focus on the diffusion estimation task. Let there be a distributed network
consisting of a set of nodes interacting with their neighbours, which collectively estimate
the common parameter of interest using the same model structure. Furthermore, let us impose
the following constraint: the nodes are able to communicate one-to-one only within their
closed neighbourhood, defined as follows:

Definition 1. Given a network represented by an undirected graph consisting of
$M \in \mathbb{N}$ nodes, the closed neighbourhood $\mathcal{N}_k$ of the $k$th node,
$1 \le k \le M$, is the set consisting of its adjacent nodes and node $k$.

An example of a network including a closed neighbourhood $\mathcal{N}_1 = \{1, 2, 3, 5\}$ of
node $k = 1$ is depicted in Figure 1.

Fig. 1. Closed neighbourhood $\mathcal{N}_1 = \{1, 2, 3, 5\}$.
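Definition 1 translates directly into code. The sketch below assumes the network is given as an undirected edge list; the edge list itself is hypothetical, and only the edges incident to node 1 are chosen so that the neighbourhood $\{1, 2, 3, 5\}$ quoted above is reproduced.

```python
def closed_neighbourhood(edges, k):
    """Return the closed neighbourhood of node k: its adjacent nodes plus k itself."""
    nodes = {k}
    for i, j in edges:
        if i == k:
            nodes.add(j)
        elif j == k:
            nodes.add(i)
    return nodes

# Hypothetical undirected network on five nodes (not taken from Figure 1).
edges = [(1, 2), (1, 3), (1, 5), (2, 4), (3, 4), (4, 5)]
n1 = closed_neighbourhood(edges, 1)
```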

The diffusion estimation involves two subsequent steps, the former of which is optional but
preferred:

Incremental update - also known as the data update, it is a diffusion alternative of (3).
The nodes propagate data within their closed neighbourhood and incorporate them into
their local statistical knowledge;

Spatial update - the nodes propagate point parameter estimates (i.e., mean values) or
posterior pdfs within their closed neighbourhood and correct their local estimates.

Fig. 2. Incremental update of node $k = 1$ by data from its adjacent neighbours
$l \in \mathcal{N}_1$. The spatial update looks similar; the nodes exchange either whole pdfs
(i.e., the hyperparameters) of $\Theta$ or its estimates.



3.1 Incremental update 

First, we develop the general theory of the incremental update using the Bayesian decision
making paradigm. Let $\mathcal{A}$ be a measurable space of decisions, $\beta = \dim(\Theta)$
and let $L : \mathbb{R}^{\beta} \times \mathcal{A} \to \mathbb{R}$ be an $L_1$-measurable loss
function. The Bayesian decision making problem consists of choosing $a \in \mathcal{A}$ by
using a measurable decision rule $\delta : \mathbb{R} \to \mathcal{A}$ after an observation of
the random variable $X$ has been obtained. Therefore, we introduce the risk

$$R(\Theta, \delta) = E_X\left[L(\Theta, \delta(X)) \mid \Theta = \theta\right] \qquad (4)$$

and the Bayesian risk function

$$\rho(g, \delta) = E_\Theta\left[R(\Theta, \delta)\right] \qquad (5)$$

measuring the quality of a decision rule $\delta$ under ignorance of a parameter $\Theta$ with
prior $g(\Theta)$. The Bayes' rule is the one which satisfies the condition

$$E_\Theta\left[L(\Theta, \delta(X)) \mid X = x\right] = \inf_{a \in \mathcal{A}} E_\Theta\left[L(\Theta, a) \mid X = x\right], \qquad (6)$$

where the integration is with respect to the posterior pdf of $\Theta$.

Consider now the situation from the $k$th node's perspective, exploiting the data from its
closed neighbourhood. In [Stone, 1977], for any given $a$ and weights $c_{l,k}$ (where
$l \in \mathcal{N}_k$), an approximation of the Bayesian inference under ignorance of the
prior distribution was proposed in terms of

$$E\left[L_k(\Theta, a) \mid X = x\right] = \sum_{l \in \mathcal{N}_k} c_{l,k} L_l(\Theta, a). \qquad (7)$$

Namely, $c_{l,k}$ represents the weight of the $l$th node with respect to the $k$th one and
$\sum_{l \in \mathcal{N}_k} c_{l,k} = 1$.

Recall that the Bayes' rule transforming the prior pdf to the posterior pdf is completely
compatible with the maximum entropy principle [Giffin and Caticha, 2007]; hence we only need
to reflect the fact that, for a fixed time, multiple data are at our disposal. To stay in the
entropy framework, we will exploit the minimum cross-entropy principle (MinXEnt) to find a
rule for handling the data.

Definition 2. (Kullback-Leibler divergence).
Let $f$, $g$ be two pdfs describing the random variable $X$. The Kullback-Leibler divergence
(also known as the cross-entropy) of $f$ and $g$ is defined as

$$\mathcal{D}(f \,\|\, g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx
= \int f(x) \log f(x)\, dx - \int f(x) \log g(x)\, dx
= H(f, g) - H(f), \qquad (8)$$

where $H(f) = -\int f(x) \log f(x)\, dx$ denotes entropy and
$H(f, g) = -\int f(x) \log g(x)\, dx$.

Corollary 3. Given $f$, the minimization of the Kullback-Leibler divergence
$\mathcal{D}(f \,\|\, g)$ is equivalent to the minimization of $H(f, g)$.

Proof. Trivial: for a given $f$, the term $H(f)$ in (8) does not depend on $g$.
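The decomposition (8) and Corollary 3 can be checked numerically. The sketch below uses discrete distributions in place of pdfs, which is an assumption made for illustration; the distributions and the helper names are hypothetical.

```python
import math

def entropy(f):
    """H(f) = -sum f log f (discrete analogue of the entropy in (8))."""
    return -sum(p * math.log(p) for p in f if p > 0)

def cross_entropy(f, g):
    """H(f, g) = -sum f log g."""
    return -sum(p * math.log(q) for p, q in zip(f, g) if p > 0)

def kl_divergence(f, g):
    """D(f || g) = sum f log(f / g)."""
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0)

f = [0.5, 0.3, 0.2]
candidates = {
    "g1": [0.4, 0.4, 0.2],
    "g2": [0.5, 0.25, 0.25],
    "g3": [0.2, 0.3, 0.5],
}

# For a fixed f, ranking candidates by D(f || g) or by H(f, g) gives the same winner,
# because the two criteria differ only by the constant H(f).
best_by_kl = min(candidates, key=lambda name: kl_divergence(f, candidates[name]))
best_by_h = min(candidates, key=lambda name: cross_entropy(f, candidates[name]))
```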

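Instead of operating on the nodes' posterior pdfs using a sort of averaging or projection,
e.g. [Karny et al., 2006], we propose to exploit the principle of weighted likelihoods
[Wang, 2004, 2006]. Let $f(x \mid \Theta)$ and $f(x \mid a)$ denote conditional pdfs with
respect to $\Theta$ and $a$, respectively. The Bayesian framework assigns
$\mathcal{D}(f(x \mid \Theta) \,\|\, f(x \mid a)) = L(\Theta, a)$. Under $k$ fixed, (7) reads

$$E\left[L_k(\Theta, a) \mid x_k\right] = \sum_{l \in \mathcal{N}_k} c_{l,k}\, \mathcal{D}\left(f_l(x_l \mid \Theta) \,\|\, f_l(x_l \mid a)\right),$$

where $x_l$ denotes data from the $l$th node. Since we have just one observation for each
node $l \in \mathcal{N}_k$, we get

$$E\left[L_k(\Theta, a) \mid x_k\right] = \sum_{l \in \mathcal{N}_k} c_{l,k}\, f_l(x_l \mid \Theta) \log \frac{f_l(x_l \mid \Theta)}{f_l(x_l \mid a)}. \qquad (9)$$

Under ignorance of $\Theta$ we set, according to the maximum entropy principle,
$f_l(x_l \mid \Theta) = 1/\mathrm{card}(\mathcal{N}_k)$, where card denotes set cardinality.
Formula (9) then looks as follows:

$$E\left[L_k(\Theta, a) \mid x_k\right] = \frac{1}{\mathrm{card}(\mathcal{N}_k)} \sum_{l \in \mathcal{N}_k} c_{l,k} \left[\log \frac{1}{\mathrm{card}(\mathcal{N}_k)} - \log f_l(x_l \mid a)\right]. \qquad (10)$$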

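We see that only the second part of (10) needs to be considered for the minimization over
the set $\mathcal{A}$ of possible decisions. In particular:

$$\arg\min_{a \in \mathcal{A}} \left(-\sum_{l \in \mathcal{N}_k} c_{l,k} \log f_l(x_l \mid a)\right)
= \arg\max_{a \in \mathcal{A}} \sum_{l \in \mathcal{N}_k} c_{l,k} \log f_l(x_l \mid a)
= \arg\max_{a \in \mathcal{A}} \prod_{l \in \mathcal{N}_k} f_l(x_l \mid a)^{c_{l,k}}, \qquad (11)$$

where $c_{l,k}$ denote the previously mentioned weights. The argument (11) together with the
Bayes' rule (3), preserving entropy maximization, yields a theoretically consistent
incremental update in the form

$$g_k(\Theta \mid d(t)) \propto g_k(\Theta \mid d(t-1)) \times \prod_{l \in \mathcal{N}_k} f_l(y_{l;t} \mid u_{l;t}, d(t-1), \Theta)^{c_{l,k}}, \qquad (12)$$

where $d(t)$ stands for all data available from sources in $\mathcal{N}_k$.
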
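The weighted-likelihood update (12) can be sketched on a parameter grid. Everything specific in the example is an assumption for illustration: the Gaussian likelihood with known unit variance, the grid, the three neighbours' observations and the weights are hypothetical.

```python
import math

def incremental_update(prior, likelihoods, weights):
    """Diffusion data update (12) on a grid.

    likelihoods[l][i] is f_l(y_l | Theta = grid[i]); weights[l] plays the role of c_{l,k}.
    """
    unnormalized = list(prior)
    for lik, c in zip(likelihoods, weights):
        unnormalized = [u * (v ** c) for u, v in zip(unnormalized, lik)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def gaussian_likelihood(y, grid, sigma=1.0):
    """Likelihood of observation y for every candidate mean on the grid."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return [math.exp(-0.5 * ((y - th) / sigma) ** 2) / norm for th in grid]

grid = [0.0, 0.5, 1.0, 1.5, 2.0]
prior = [1.0 / len(grid)] * len(grid)
observations = [1.0, 1.2, 0.8]   # one observation per node in the closed neighbourhood
weights = [0.5, 0.25, 0.25]      # hypothetical c_{l,k}, summing to one

likelihoods = [gaussian_likelihood(y, grid) for y in observations]
posterior = incremental_update(prior, likelihoods, weights)
```

With a weight vector that puts all mass on the node's own observation, the function reduces to the ordinary update (3).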
3.2 Spatial update 

The spatial update follows after the incremental update. In this step, the nodes exchange
information about the unknown model parameter $\Theta$, either in the form of its estimates
or hyperparameters of its distribution. Formally, for fixed $k$, the information from all
nodes in $\mathcal{N}_k$ describes the finite mixture density

$$g_k(\Theta \mid d(t)) = \sum_{l \in \mathcal{N}_k} a_{l,k}\, g_l(\Theta \mid d(t)), \qquad \sum_{l \in \mathcal{N}_k} a_{l,k} = 1, \qquad (13)$$

where $0 \le a_{l,k} \le 1$ is the weight of the $l$th node's estimate from the $k$th node's
viewpoint.

Here, two possible departure points arise. First, more generally, we may be interested in a
"consensus" distribution, i.e., a single distribution best representing the mixture (13) at
node $k$. Its pdf can be found as the argument minimizing the Kullback-Leibler divergence,

$$\arg\min_{\tilde{g}_k(\Theta \mid d(t)) \in \mathcal{G}} \mathcal{D}\left(g_k(\Theta \mid d(t)) \,\|\, \tilde{g}_k(\Theta \mid d(t))\right), \qquad (14)$$

where $g_k$ is the mixture (13), $\tilde{g}_k$ is the approximating pdf and $\mathcal{G}$ is
the class of all admissible pdfs.

The second possibility emerges if we are interested just in the moment(s) available from
$g_k(\Theta \mid d(t))$. Then, e.g., the first moment (the mean value) is given by the convex
combination of the mean values of the mixture density components,

$$\hat{\Theta}_k = \sum_{l \in \mathcal{N}_k} a_{l,k} \hat{\Theta}_l. \qquad (15)$$

For other moments see, e.g., Frühwirth-Schnatter [2006]. The latter approach is of
particular interest if the distribution is parameterized by moments (e.g., the Gaussian
distribution). Another appealing fact related to these distributions is that (15) is often a
direct consequence of (14). In these cases, it is possible to omit the Kullback-Leibler
divergence minimization and benefit directly from (15). While (15) is a final product at
time $t$, the pdf resulting from (14) can be reused as the $k$th node's prior pdf at the next
time step.
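The moment form (15) of the spatial update is a plain convex combination. A minimal sketch follows; the function name, the estimates and the weights are hypothetical values chosen for illustration.

```python
def spatial_update(estimates, weights):
    """Convex combination (15) of the neighbours' point estimates (vectors as lists)."""
    if min(weights) < 0.0 or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be nonnegative and sum to one")
    dim = len(estimates[0])
    return [sum(a * est[i] for a, est in zip(weights, estimates)) for i in range(dim)]

# Hypothetical estimates from three nodes of a closed neighbourhood.
estimates = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]
weights = [0.5, 0.25, 0.25]   # a_{l,k}
combined = spatial_update(estimates, weights)
```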

Properties of the diffusion estimator strongly depend on the underlying particular
estimators in a neighbourhood and their weights $a_{l,k}$ and $c_{l,k}$. In this respect,
an effective determination of the weights is essential.



3.3 Determination of weights $a_{l,k}$ and $c_{l,k}$

There are several possible strategies for determining the weights $a_{l,k}$ and $c_{l,k}$.
Besides the rather impractical uniform weights, one can use the Metropolis weights proposed
by Xiao et al. [2006] and further used in recent literature. Other options are the relative
degree weights and the yet more sophisticated relative degree-variance weights, based on
the cardinality of the node's closed neighbourhood [Cattivelli and Sayed, 2010]. We only
conjecture that a suitable probabilistic method exploiting, e.g., the likelihood of the
$l$th node's data with respect to the $k$th node could be found as well. A substantial
advantage of such a method would be its suitability for dynamic cases, which require stable
determination of $a_{l,k}$ and $c_{l,k}$. As a consequence, it would make it possible to
suppress the influence of data and/or estimates from a failing node (sensor) on other
nodes. Such methods are currently under development.
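As an illustration of one of the listed strategies, the sketch below computes Metropolis-type weights. It assumes one common form of the rule, $1/\max(n_k, n_l)$ for $l \neq k$ with $n$ the cardinality of the closed neighbourhood and the remaining mass assigned to the node itself; variants exist in the literature, so the exact normalization should be treated as an assumption. The network is hypothetical.

```python
def metropolis_weights(neighbourhoods, k):
    """Weights a_{l,k} for all l in the closed neighbourhood of node k.

    neighbourhoods maps each node to its closed neighbourhood (a set that
    contains the node itself).
    """
    n_k = len(neighbourhoods[k])
    weights = {}
    for l in neighbourhoods[k]:
        if l != k:
            weights[l] = 1.0 / max(n_k, len(neighbourhoods[l]))
    # The self-weight takes whatever mass is left, so the weights sum to one.
    weights[k] = 1.0 - sum(weights.values())
    return weights

# Hypothetical five-node network given by its closed neighbourhoods.
neighbourhoods = {
    1: {1, 2, 3, 5},
    2: {1, 2, 4},
    3: {1, 3, 4},
    4: {2, 3, 4, 5},
    5: {1, 4, 5},
}
weights_1 = metropolis_weights(neighbourhoods, 1)
weights_2 = metropolis_weights(neighbourhoods, 2)
```

With this rule the weight between two adjacent nodes is the same from either node's viewpoint, which is a typical reason for choosing it.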



4. DERIVATION FOR GAUSSIAN REGRESSIVE 
MODEL 

In this section, a practical application of the proposed methodology is given. We derive
the dynamic Bayesian diffusion estimator of the popular Gaussian linear regressive model.
In the following two subsections, we briefly present the standard Bayesian estimation of
this model and develop its diffusion estimator. This case is just one example of a wide
class of models to which the methodology applies in a straightforward way. This class
includes, in particular, the popular Bayesian models with conjugate priors.

4.1 Gaussian linear regressive model

Given a regression vector $\psi_t \in \mathbb{R}^n$, $t = 1, 2, \ldots$ and a dependent
random variable $y_t \in \mathbb{R}$, the Gaussian linear regressive model takes the form

$$y_t = \psi_t^T \theta + \varepsilon_t, \qquad (16)$$

where $\theta \in \mathbb{R}^n$ is the regression coefficient and
$\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ is Gaussian white noise. This makes
$y_t \sim \mathcal{N}(\psi_t^T \theta, \sigma^2)$ and the regression model (16) can be
expressed by the pdf $f(y_t \mid \psi_t, \Theta)$. From the Bayesian viewpoint, the model
parameters $\Theta = \{\theta, \sigma^2\}$ are also random variables. Under ignorance of
their values, the proper conjugate prior distribution is the normal inverse-gamma
($\mathcal{N}i\Gamma$) one [Bernardo and Smith, 1994]. Namely, $\theta$ is normal (given
$\sigma^2$) and $\sigma^2$ is inverse-gamma.

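Definition 4. (Normal inverse-gamma pdf).
For a variable $\Theta = \{\theta, \sigma^2\}$, $\theta \in \mathbb{R}^n$ and
$\sigma^2 \in \mathbb{R}^+$, the normal inverse-gamma $\mathcal{N}i\Gamma(V, \nu)$ pdf with a
symmetric positive definite extended information matrix $V \in \mathbb{R}^{N \times N}$,
$N = n + 1$, and the degrees of freedom $\nu \in \mathbb{R}^+$ has the form

$$g(\theta, \sigma^2 \mid V, \nu) = \frac{\sigma^{-(\nu+n+1)}}{I(V, \nu)} \exp\left\{-\frac{1}{2\sigma^2} \begin{bmatrix} -1 \\ \theta \end{bmatrix}^T V \begin{bmatrix} -1 \\ \theta \end{bmatrix}\right\},$$

where $I(\cdot)$ is the normalization term such that

$$\int g(\theta, \sigma^2 \mid V, \nu)\, d\Theta = 1.$$

Both $V$ and $\nu$ are sufficient statistics [Bernardo and Smith, 1994] representing data
$d(t-1) = \{y_{t-1}, \psi_{t-1}, \ldots, y_0, \psi_0\}$. The Bayesian recursive estimation (3)
updates the prior pdf by new data according to the following theorem.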

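Theorem 5. (Bayesian estimation of a $\mathcal{N}i\Gamma$ model).
Let $g(\theta, \sigma^2 \mid V, \nu)$ be a $\mathcal{N}i\Gamma$ pdf, $t = 1, 2, \ldots$ The
Bayesian estimation (3) updates the sufficient statistics $V \in \mathbb{R}^{N \times N}$ and
$\nu \in \mathbb{R}$ by the real scalar realization $y_t$ and the regression vector
$\psi_t \in \mathbb{R}^{N-1}$ as follows:

$$V_t = V_{t-1} + \begin{bmatrix} y_t \\ \psi_t \end{bmatrix} \begin{bmatrix} y_t \\ \psi_t \end{bmatrix}^T, \qquad (17)$$

$$\nu_t = \nu_{t-1} + 1. \qquad (18)$$

The multivariate point estimator $\hat{\theta}_t \in \mathbb{R}^{N-1}$ of the regression
coefficient is the mean value of the $\mathcal{N}i\Gamma$ distribution given by

$$\hat{\theta}_t = \begin{bmatrix} V_{22} & \cdots & V_{2N} \\ \vdots & \ddots & \vdots \\ V_{N2} & \cdots & V_{NN} \end{bmatrix}_t^{-1} \begin{bmatrix} V_{21} \\ \vdots \\ V_{N1} \end{bmatrix}_t. \qquad (19)$$

Proof. The update of the statistics $V$ and $\nu$ follows directly from the multiplication
of Gaussian models (likelihoods), see, e.g., Peterka [1981]. The point estimator is the
well-known ordinary least squares estimator.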
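Theorem 5 can be exercised with a few lines of code. The sketch below stores $V$ as a nested list and obtains (19) by solving the linear system formed from the lower-right block and the first column of $V$; the helper names, the weak diagonal prior and the noise-free data are assumptions made for the example.

```python
def outer_update(V, y, psi, weight=1.0):
    """Add weight * [y; psi][y; psi]^T to the extended information matrix, cf. (17)."""
    z = [y] + list(psi)
    size = len(z)
    return [[V[i][j] + weight * z[i] * z[j] for j in range(size)] for i in range(size)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def point_estimate(V):
    """Point estimate (19): solve (lower-right block of V) * theta = (first column of V)."""
    V_psi = [row[1:] for row in V[1:]]
    V_ypsi = [row[0] for row in V[1:]]
    return solve(V_psi, V_ypsi)

# Hypothetical noise-free data generated with theta = [2, -1]; weak diagonal prior.
n = 2
V = [[1e-6 if i == j else 0.0 for j in range(n + 1)] for i in range(n + 1)]
nu = 0.0
data = [(2.0, [1.0, 0.0]), (-1.0, [0.0, 1.0]), (1.0, [1.0, 1.0]), (3.0, [2.0, 1.0])]
for y, psi in data:
    V = outer_update(V, y, psi)   # (17)
    nu += 1.0                     # (18)
theta_hat = point_estimate(V)
```

Solving the linear system instead of forming the inverse explicitly is a design choice of this sketch, not something prescribed by the theorem.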

4.2 Diffusion estimation of the Bayesian regressive model

In order to derive the dynamic Bayesian diffusion estimator of $\Theta$, we follow the
principles given in Section 3. Let us consider a network of $M \in \mathbb{N}$ distributed
nodes. Each node $k \in \{1, \ldots, M\}$ evaluates a model

$$f(y_{k;t} \mid \psi_{k;t}, \Theta, V_{k;t-1}, \nu_{k;t-1}) \qquad (20)$$

and runs the diffusion Bayesian estimation (12) of its parameters in the form

$$g_k(\Theta \mid V_{k;t}, \nu_{k;t}) \propto g_k(\Theta \mid V_{k;t-1}, \nu_{k;t-1}) \times \prod_{l \in \mathcal{N}_k} f_l(y_{l;t} \mid \psi_{l;t}, \Theta, V_{l;t-1}, \nu_{l;t-1})^{c_{l,k}}. \qquad (21)$$

Here $0 \le c_{l,k} \le 1$ weights the $l$th node's data with respect to the $k$th node,
$l \in \mathcal{N}_k$, where $\sum_{l \in \mathcal{N}_k} c_{l,k} = 1$. Simply put, the $k$th
node updates its prior pdf of $\Theta$ by data from its closed neighbourhood
$\mathcal{N}_k$. Since we deal with the $\mathcal{N}i\Gamma$ pdf, this update takes the form
expressed by the following proposition.

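Proposition 6. (Incremental update of $\mathcal{N}i\Gamma$ pdf). Given a $k$th node,
$k \in \{1, \ldots, M\}$, the incremental version of the Bayesian estimation (Theorem 5)
updates the $k$th node's prior $\mathcal{N}i\Gamma$ pdf of $\Theta$ by data
$[y_{l;t}, \psi_{l;t}^T]^T$, weighted by $c_{l,k}$, from its adjacent neighbours
$l \in \mathcal{N}_k$ according to the following rules:

$$V_{k;t} = V_{k;t-1} + \sum_{l \in \mathcal{N}_k} c_{l,k} \begin{bmatrix} y_{l;t} \\ \psi_{l;t} \end{bmatrix} \begin{bmatrix} y_{l;t} \\ \psi_{l;t} \end{bmatrix}^T, \qquad (22)$$

$$\nu_{k;t} = \nu_{k;t-1} + \sum_{l \in \mathcal{N}_k} c_{l,k}, \qquad (23)$$

where

$$0 \le c_{l,k} \le 1, \qquad \sum_{l \in \mathcal{N}_k} c_{l,k} = 1, \qquad l \in \mathcal{N}_k.$$

Proof. Let $\kappa = \mathrm{card}(\mathcal{N}_k)$. The formula (22) following from (21) is
equivalent to $\kappa$ updates (17) of $V_{k;t-1}$ by data $[y_{l;t}, \psi_{l;t}^T]^T$ weighted
by $c_{l,k}$. Formula (23) is a direct equivalent of (18). □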
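The rules (22) and (23) amount to a weighted sum of outer products. A minimal sketch follows, assuming the statistics are stored as nested lists; the function name and the numbers are hypothetical.

```python
def incremental_update(V, nu, observations, weights):
    """Apply (22)-(23) at one node.

    observations[l] = (y_l, psi_l) from each member of the closed neighbourhood,
    weights[l] = c_{l,k}.
    """
    size = len(V)
    V_new = [row[:] for row in V]
    for (y, psi), c in zip(observations, weights):
        z = [y] + list(psi)
        for i in range(size):
            for j in range(size):
                V_new[i][j] += c * z[i] * z[j]   # weighted outer product, cf. (22)
    return V_new, nu + sum(weights)              # cf. (23)

# Hypothetical scalar-regressor example (n = 1) with two neighbours.
V0 = [[0.0, 0.0], [0.0, 0.0]]
observations = [(1.0, [1.0]), (3.0, [1.0])]
weights = [0.5, 0.5]
V1, nu1 = incremental_update(V0, 0.0, observations, weights)
```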

In linear regression, we are particularly interested in point estimation of the regression
coefficient $\theta$.

Proposition 7. (Spatial update of $\hat{\theta}$). Given a $k$th node,
$k \in \{1, \ldots, M\}$, the spatial update (15) of the estimate $\hat{\theta}_{k;t}$ has the
form

$$\hat{\theta}_{k;t} = \sum_{l \in \mathcal{N}_k} a_{l,k} \hat{\theta}_{l;t}, \qquad (24)$$

where

$$0 \le a_{l,k} \le 1, \qquad \sum_{l \in \mathcal{N}_k} a_{l,k} = 1,$$

and $a_{l,k}$ denotes the weight of the $l$th node's point estimate with respect to the
$k$th node.

Proof. This is a straightforward use of (15). □

A similar procedure applies to the estimation of $\sigma^2$. The derived steps are
summarized in Algorithm 1.



5. DYNAMIC BAYESIAN DIFFUSION REGRESSIVE 
MODEL AND RLS 

Let us demonstrate the simplicity of the transition from the dynamic Bayesian diffusion
estimation to its non-Bayesian counterpart. For simplicity, consider $y$ scalar and
partition the extended information matrix $V$ as follows:

$$V = \begin{bmatrix} V_y & V_{y\psi}^T \\ V_{y\psi} & V_\psi \end{bmatrix}, \qquad \text{where } V_y \in \mathbb{R}. \qquad (25)$$

Furthermore, let us denote $C = V_\psi^{-1}$ and see how the update (Proposition 6) performs
on the reparameterized $\mathcal{N}i\Gamma$ pdf.

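Proposition 8. (Reparametrization of $\mathcal{N}i\Gamma$ pdf).
Given the pdf $\mathcal{N}i\Gamma(V, \nu)$ of $\Theta = \{\theta, \sigma^2\}$. The statistic
$V \in \mathbb{R}^{N \times N}$ can be decomposed into the lower-dimensional statistics
$C \in \mathbb{R}^{n \times n}$, $\hat{\theta} \in \mathbb{R}^n$ and $\Lambda \in \mathbb{R}$,
where $n = N - 1$, yielding the reparametrized pdf
$\mathcal{N}i\Gamma(C, \hat{\theta}, \Lambda, \nu)$ as follows:

$$g(\theta, \sigma^2 \mid C, \hat{\theta}, \Lambda, \nu) = \frac{\sigma^{-(\nu+n+1)}}{I(C, \hat{\theta}, \Lambda, \nu)} \exp\left\{-\frac{1}{2\sigma^2} \left[(\theta - \hat{\theta})^T C^{-1} (\theta - \hat{\theta}) + \Lambda\right]\right\}, \qquad (26)$$

where

$$\hat{\theta} = C V_{y\psi}, \qquad (27)$$

$$\Lambda = V_y - V_{y\psi}^T C V_{y\psi}, \qquad (28)$$

and where $I(C, \hat{\theta}, \Lambda, \nu)$ is the normalization term such that

$$\int g(\theta, \sigma^2 \mid C, \hat{\theta}, \Lambda, \nu)\, d\Theta = 1.$$

Proof. By completion of squares with $C = V_\psi^{-1}$,

$$\begin{bmatrix} -1 \\ \theta \end{bmatrix}^T \begin{bmatrix} V_y & V_{y\psi}^T \\ V_{y\psi} & V_\psi \end{bmatrix} \begin{bmatrix} -1 \\ \theta \end{bmatrix}
= V_y - 2\theta^T V_{y\psi} + \theta^T V_\psi \theta
= (\theta - C V_{y\psi})^T C^{-1} (\theta - C V_{y\psi}) + \left(V_y - V_{y\psi}^T C V_{y\psi}\right).$$

□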
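The completion of squares behind (26)-(28) can be checked numerically. The sketch below is restricted to a scalar regressor ($n = 1$), which is an assumption made to keep the example short; the matrix entries and helper names are hypothetical.

```python
def reparameterize(V):
    """Scalar-regressor case of (27)-(28) for V = [[V_y, V_ypsi], [V_ypsi, V_psi]]."""
    V_y, V_ypsi, V_psi = V[0][0], V[1][0], V[1][1]
    C = 1.0 / V_psi                      # C is the inverse of the regressor block
    theta_hat = C * V_ypsi               # (27)
    Lam = V_y - V_ypsi * C * V_ypsi      # (28)
    return C, theta_hat, Lam

def quadratic_form(V, theta):
    """[-1, theta] V [-1, theta]^T, the quadratic form in the exponent of the pdf."""
    z = [-1.0, theta]
    return sum(z[i] * V[i][j] * z[j] for i in range(2) for j in range(2))

V = [[5.0, 2.0], [2.0, 1.0]]
C, theta_hat, Lam = reparameterize(V)

# Both parameterizations give the same exponent for any value of theta.
differences = [
    abs(quadratic_form(V, th) - ((th - theta_hat) ** 2 / C + Lam))
    for th in [-1.0, 0.0, 0.5, 2.0, 3.0]
]
```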



Now, we focus on the recursive update of the $k$th node's reparameterized
$\mathcal{N}i\Gamma$ pdf statistics. First note that the right-hand side of formula (22) can
be viewed as a sequential (one-by-one) update of the $k$th node's $V_{k;t}$ by data
$[y_{l;t}, \psi_{l;t}^T]^T$ with weights $c_{l,k}$, where $l \in \mathcal{N}_k$. This means that
when the transition $(t-1) \to t$ occurs, the assignment

$$V_{k;t} := V_{k;t-1} \qquad (29)$$

is made, followed by the updates

$$V_{k;t} \leftarrow V_{k;t} + c_{l,k} \begin{bmatrix} y_{l;t} \\ \psi_{l;t} \end{bmatrix} \begin{bmatrix} y_{l;t} \\ \psi_{l;t} \end{bmatrix}^T \qquad \text{for all } l \in \mathcal{N}_k. \qquad (30)$$

Therefore, we can take advantage of deriving the update of the $k$th reparameterized pdf by
data from the $l$th node. The reparameterized equivalent of (22) then results from (30) for
all $l \in \mathcal{N}_k$ and $t$ fixed. This sequential update procedure is described by the
following proposition.

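Proposition 9. (Update of reparameterized $\mathcal{N}i\Gamma$ pdf).
Given a pdf $g(\theta, \sigma^2 \mid C, \hat{\theta}, \Lambda, \nu)$ of the $k$th node at fixed
time $t$. After the initialization

$$C_{k;t} := C_{k;t-1}, \qquad \hat{\theta}_{k;t} := \hat{\theta}_{k;t-1}, \qquad \Lambda_{k;t} := \Lambda_{k;t-1}, \qquad \nu_{k;t} := \nu_{k;t-1}, \qquad (31)$$

the update by data $y_{l;t}$, $\psi_{l;t}$, weighted by $c_{l,k}$ for all
$l \in \mathcal{N}_k$, reads

$$C_{k;t} \leftarrow C_{k;t} - \frac{c_{l,k}\, C_{k;t} \psi_{l;t} \psi_{l;t}^T C_{k;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}}, \qquad (32)$$

$$\hat{\theta}_{k;t} \leftarrow \hat{\theta}_{k;t} + \frac{c_{l,k}\, C_{k;t} \psi_{l;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}} \left[y_{l;t} - \psi_{l;t}^T \hat{\theta}_{k;t}\right], \qquad (33)$$

$$\Lambda_{k;t} \leftarrow \Lambda_{k;t} + \frac{c_{l,k} \left(y_{l;t} - \psi_{l;t}^T \hat{\theta}_{k;t}\right)^2}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}}, \qquad (34)$$

$$\nu_{k;t} \leftarrow \nu_{k;t} + c_{l,k}, \qquad (35)$$

where every right-hand side is evaluated with the values of $C_{k;t}$ and
$\hat{\theta}_{k;t}$ held before the update by the $l$th node's data.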


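Proof. Fix $t$ and rewrite the update of the blocks of $V_{k;t}$ of the $k$th node by
$y_{l;t}$ and $\psi_{l;t}$ from its adjacent neighbour $l \in \mathcal{N}_k$. The
initialization (31) is equivalent to

$$V_{k;t} \leftarrow V_{k;t-1}, \qquad \nu_{k;t} \leftarrow \nu_{k;t-1}.$$

The blocks of $V_{k;t}$ are updated as follows:

$$V_{k;y;t} \leftarrow V_{k;y;t} + c_{l,k}\, y_{l;t}^2, \qquad (36)$$

$$V_{k;\psi;t} \leftarrow V_{k;\psi;t} + c_{l,k}\, \psi_{l;t} \psi_{l;t}^T, \qquad (37)$$

$$V_{k;y\psi;t} \leftarrow V_{k;y\psi;t} + c_{l,k}\, \psi_{l;t} y_{l;t}. \qquad (38)$$

Notice that (37) is equivalent to

$$C_{k;t}^{-1} \leftarrow C_{k;t}^{-1} + c_{l,k}\, \psi_{l;t} \psi_{l;t}^T. \qquad (39)$$

By application of the Sherman-Morrison formula, Proposition 10 in the Appendix, we obtain

$$C_{k;t} \leftarrow C_{k;t} - \frac{c_{l,k}\, C_{k;t} \psi_{l;t} \psi_{l;t}^T C_{k;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}},$$

which proves (32).

The substitution of (32) and (38) into (27) yields

$$\hat{\theta}_{k;t} \leftarrow \left(C_{k;t} - \frac{c_{l,k}\, C_{k;t} \psi_{l;t} \psi_{l;t}^T C_{k;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}}\right) \left(V_{k;y\psi;t} + c_{l,k}\, \psi_{l;t} y_{l;t}\right)$$

$$= C_{k;t} V_{k;y\psi;t} + c_{l,k}\, C_{k;t} \psi_{l;t} y_{l;t} - \frac{c_{l,k}\, C_{k;t} \psi_{l;t} \psi_{l;t}^T C_{k;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}} \left(V_{k;y\psi;t} + c_{l,k}\, \psi_{l;t} y_{l;t}\right)$$

$$= \hat{\theta}_{k;t} + \frac{c_{l,k}\, C_{k;t} \psi_{l;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}} \left[y_{l;t} \left(1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}\right) - \psi_{l;t}^T C_{k;t} V_{k;y\psi;t} - c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}\, y_{l;t}\right]$$

$$= \hat{\theta}_{k;t} + \frac{c_{l,k}\, C_{k;t} \psi_{l;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}} \left[y_{l;t} - \psi_{l;t}^T \hat{\theta}_{k;t}\right],$$

proving (33).

The formula for $\Lambda$ is obtained similarly:

$$\Lambda_{k;t} \leftarrow V_{k;y;t} + c_{l,k}\, y_{l;t}^2 - \left(V_{k;y\psi;t} + c_{l,k}\, \psi_{l;t} y_{l;t}\right)^T \left(C_{k;t} - \frac{c_{l,k}\, C_{k;t} \psi_{l;t} \psi_{l;t}^T C_{k;t}}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}}\right) \left(V_{k;y\psi;t} + c_{l,k}\, \psi_{l;t} y_{l;t}\right)$$

$$= \Lambda_{k;t} + \frac{c_{l,k} \left(y_{l;t} - \psi_{l;t}^T \hat{\theta}_{k;t}\right)^2}{1 + c_{l,k}\, \psi_{l;t}^T C_{k;t} \psi_{l;t}},$$

which proves (34). Finally, (35) follows from (23) applied one neighbour at a time, with

$$\sum_{l \in \mathcal{N}_k} c_{l,k} = 1.$$

□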
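The agreement between the recursive form and the update of $V$ can be checked on a small example. The sketch below implements one neighbour's step with all right-hand sides evaluated at pre-update values, and relies on $C$ being symmetric; the function name and the numbers are hypothetical.

```python
def rls_diffusion_step(C, theta, Lam, nu, y, psi, c):
    """One neighbour's update (32)-(35) of the reparameterized statistics."""
    n = len(theta)
    C_psi = [sum(C[i][j] * psi[j] for j in range(n)) for i in range(n)]   # C psi
    denom = 1.0 + c * sum(psi[i] * C_psi[i] for i in range(n))            # 1 + c psi^T C psi
    err = y - sum(psi[i] * theta[i] for i in range(n))                    # prediction error
    C_new = [[C[i][j] - c * C_psi[i] * C_psi[j] / denom for j in range(n)]
             for i in range(n)]                                           # (32)
    theta_new = [theta[i] + c * C_psi[i] * err / denom for i in range(n)] # (33)
    Lam_new = Lam + c * err * err / denom                                 # (34)
    return C_new, theta_new, Lam_new, nu + c                              # (35)

# Hypothetical scalar-regressor state corresponding to V = [[5, 2], [2, 1]].
C0, theta0, Lam0, nu0 = [[1.0]], [2.0], 1.0, 1.0
C1, theta1, Lam1, nu1 = rls_diffusion_step(C0, theta0, Lam0, nu0, y=4.0, psi=[1.0], c=0.5)

# The same data applied directly to V, cf. (30), followed by (27)-(28).
V = [[5.0 + 0.5 * 16.0, 2.0 + 0.5 * 4.0], [2.0 + 0.5 * 4.0, 1.0 + 0.5 * 1.0]]
C_direct = 1.0 / V[1][1]
theta_direct = C_direct * V[1][0]
Lam_direct = V[0][0] - V[1][0] * C_direct * V[1][0]
```

Computing the new values into fresh variables avoids accidentally using an already updated $C$ in the gain of (33).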

Obviously, since the weights $c_{l,k}$ sum to unity, it is sufficient to increment
$\nu_{k;t}$ at each time step by 1.

The well-known recursive least squares evaluate a covariance matrix and the regression
coefficient estimates, which are the same as $C$ and $\hat{\theta}$ in the reparameterized
$\mathcal{N}i\Gamma$ pdf. In this respect, the dynamic Bayesian diffusion estimation of the
Bayesian regressive model is completely equivalent to the diffusion (unweighted) RLS, cf.
Cattivelli et al. [2008]. This demonstrates the feasibility of the method. Moreover, the
exploited probabilistic framework allows the very general principles given in Section 3 to
be used with a wider class of models.

Algorithm 1: Diffusion Bayesian regressive model

Initialization:
forall the $k \in \{1, \ldots, M\}$ do
    Set prior statistics $V_{k;0}$ and $\nu_{k;0}$.
    Set weights $c_{l,k}$ and $a_{l,k}$, $l \in \mathcal{N}_k$.
end

Online steps:
for $t = 1, 2, \ldots$ do
    Incremental update:
    forall the $k \in \{1, \ldots, M\}$ do
        Gather data $[y_{l;t}, \psi_{l;t}^T]^T$ for all $l \in \mathcal{N}_k$.
        Perform the updates of $V_{k;t-1}$, $\nu_{k;t-1}$, Prop. 6.
        Calculate point estimates $\hat{\theta}_{k;t}$, Theorem 5.
    end
    Spatial update:
    forall the $k \in \{1, \ldots, M\}$ do
        Gather point estimates $\hat{\theta}_{l;t}$ for all $l \in \mathcal{N}_k$.
        Perform the update of $\hat{\theta}_{k;t}$, Prop. 7.
    end
end
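The steps of Algorithm 1 can be sketched as a small simulation. This is an illustrative sketch under several assumptions that are not part of the paper: a scalar regressor ($n = 1$), uniform weights, a hypothetical five-node network, synthetic data with a known true coefficient, and a weak diagonal prior on $V$.

```python
import random

def run_diffusion(neighbourhoods, c, a, theta_true, steps, noise_std, seed=0):
    """Simulate Algorithm 1 for a scalar regressor; returns final estimates and nu."""
    rng = random.Random(seed)
    nodes = sorted(neighbourhoods)
    V = {k: [[1e-3, 0.0], [0.0, 1e-3]] for k in nodes}   # weak prior statistics
    nu = {k: 0.0 for k in nodes}
    theta_hat = {}
    for _ in range(steps):
        # Every node takes one noisy measurement y = psi * theta + noise.
        data = {}
        for k in nodes:
            psi = rng.uniform(-1.0, 1.0)
            data[k] = (psi * theta_true + rng.gauss(0.0, noise_std), psi)
        # Incremental update (Prop. 6) and local point estimate (Theorem 5).
        local = {}
        for k in nodes:
            for l in neighbourhoods[k]:
                z = data[l]
                for i in range(2):
                    for j in range(2):
                        V[k][i][j] += c[k][l] * z[i] * z[j]
                nu[k] += c[k][l]
            local[k] = V[k][1][0] / V[k][1][1]
        # Spatial update (Prop. 7): convex combination of neighbours' estimates.
        theta_hat = {k: sum(a[k][l] * local[l] for l in neighbourhoods[k]) for k in nodes}
    return theta_hat, nu

neighbourhoods = {1: {1, 2, 3, 5}, 2: {1, 2, 4}, 3: {1, 3, 4}, 4: {2, 3, 4, 5}, 5: {1, 4, 5}}
uniform = {k: {l: 1.0 / len(nb) for l in nb} for k, nb in neighbourhoods.items()}
estimates, nu = run_diffusion(neighbourhoods, uniform, uniform,
                              theta_true=2.0, steps=200, noise_std=0.1)
```

In this sketch the spatially combined estimates are outputs only; the statistics $V$ carried to the next step are those from the incremental update, in line with the remark that (15) is a final product at time $t$.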



6. CONCLUSIONS 

The dynamic Bayesian diffusion estimation methodology provides a way of solving
decentralized estimation problems in modern complex distributed systems, e.g., sensor and
ad-hoc networks. The theoretical aspects of the method are supported by the maximum entropy
and minimum cross-entropy principles. Being developed in the Bayesian framework, it is
directly applicable to a wide class of different models. As a special case, the application
of the methodology to the dynamic Bayesian linear regression yields the particularly useful
diffusion recursive least squares. This result also supports the validity of the method. In
addition, it demonstrates that for practical purposes it is possible to leave the
distribution-oriented perspective in favor of the traditional non-Bayesian reasoning.

The foreseen research activities comprise, among others, the analysis of properties of the
diffusion estimator and Bayesian estimation under specific constraints related, e.g., to
bandwidth. Also, a probabilistic method for dynamic determination of the weighting
coefficients $a_{l,k}$ and $c_{l,k}$ is of particular interest.

7. APPENDIX 

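Proposition 10. (Sherman-Morrison formula).
Let $A \in \mathbb{R}^{n \times n}$ be an invertible matrix and $u, v \in \mathbb{R}^n$ two
vectors such that $1 + v^T A^{-1} u \neq 0$. Then, the following equality holds,

$$\left(A + u v^T\right)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}.$$

Proof. Trivial. □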
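The formula can be verified numerically on a small case. The following sketch compares the rank-one update with a direct inverse of a $2 \times 2$ matrix; the matrix, the vectors and the helper names are hypothetical.

```python
def inverse_2x2(A):
    """Direct inverse of a 2x2 matrix, used only as a reference."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def sherman_morrison(A_inv, u, v):
    """Inverse of (A + u v^T) given A^{-1}; requires 1 + v^T A^{-1} u != 0."""
    n = len(u)
    Au = [sum(A_inv[i][j] * u[j] for j in range(n)) for i in range(n)]   # A^{-1} u
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]   # v^T A^{-1}
    denom = 1.0 + sum(v[i] * Au[i] for i in range(n))
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)] for i in range(n)]

A = [[4.0, 1.0], [2.0, 3.0]]
u, v = [1.0, 2.0], [0.5, -1.0]

updated = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]
via_formula = sherman_morrison(inverse_2x2(A), u, v)
via_direct = inverse_2x2(updated)
```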